Unsupervised Text Recap Extraction for TV Series

Authors

  • Hongliang Yu
  • Shikun Zhang
  • Louis-Philippe Morency
Abstract

Sequences found at the beginning of TV shows help the audience absorb the essence of previous episodes, and grab their attention with upcoming plots. In this paper, we propose a novel task, text recap extraction. Compared with conventional summarization, text recap extraction captures the duality of summarization and plot contingency between adjacent episodes. We present a new dataset, TVRecap, for text recap extraction on TV shows. We propose an unsupervised model that identifies text recaps based on plot descriptions. We introduce two contingency factors, concept coverage and sparse reconstruction, that encourage recaps to prompt the upcoming story development. We also propose a multi-view extension of our model which can incorporate dialogues and synopses. We conduct extensive experiments on TVRecap, and conclude that our model outperforms summarization approaches.


Similar resources

Unsupervised method for the acquisition of general language paraphrases for medical compounds

Medical information is widespread in modern society (e.g. scientific research, medical blogs, clinical documents, TV and radio broadcast, novels). Moreover, everybody’s life may be concerned with medical problems. However, the medical field conveys very specific and often opaque notions (e.g., myocardial infarction, cholecystectomy, abdominal strangulated hernia, galactose urine), that are diff...


Structural Linguistics and Unsupervised Information Extraction

A precondition for extracting information from large text corpora is discovering the information structures underlying the text. Progress in this direction is being made in the form of unsupervised information extraction (IE). We describe recent work in unsupervised relation extraction and compare its goals to those of grammar discovery for science sublanguages. We consider what this work on gr...


UPC System for the 2015 MediaEval Multimodal Person Discovery in Broadcast TV task

This paper describes a system to identify people in broadcast TV shows in a purely unsupervised manner. The system outputs the identity of people that appear, talk and can be identified by using information appearing in the show (in our case, text with person names). Three types of monomodal technologies are used: speech diarization, video diarization and text detection / named entity recogniti...


Multilingual Artificial Text Extraction and Script Identification from Video Images

This work presents a system for extraction and script identification of multilingual artificial text appearing in video images. As opposed to most of the existing text extraction systems which target textual occurrences in a particular script or language, we have proposed a generic multilingual text extraction system that relies on a combination of unsupervised and supervised techniques. The un...


An Overview of Open Information Extraction

Open Information Extraction (OIE) is a recent unsupervised strategy for extracting large numbers of basic propositions (verb-based triples) from massive text corpora, and it scales to Web-size document collections. We will introduce the main properties of this extraction method.




Publication date: 2016